
Bayesian distance metric learning and its application in automatic speaker recognition systems

Author: Singh Satyanand
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2019

This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system that uses Bayesian distance metric learning as a feature extractor. In this modeling, I explore the distance constraints between modified and simplified i-vector pairs from the same speaker and from different speakers. The distance metric is approximated as a weighted covariance matrix built from the leading eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the distance metric. Given a speaker label, I select the different-speaker data pairs with the highest cosine scores to form a set of speaker constraints. This set captures the most discriminative variability between speakers in the training data. The Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, this method is less sensitive to normalization than cosine scoring, and it is very effective when training data are limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best combined cosine score performance, an EER of 1.767%, is obtained using LDA200 + NCA200 + LDA200, and the best Bayes_dml performance, an EER of 1.775%, is obtained using LDA200 + NCA200 + LDA100. Bayesian_dml overcomes the combined norm of cosine scores and is the best reported result for the short2-short3 condition of the NIST SRE 2008 data.
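The abstract above relies on three ideas: cosine scoring of i-vectors, choosing different-speaker pairs with the highest cosine score as constraints, and reporting an Equal Error Rate (EER). The following is a minimal Python sketch of those ideas, not the paper's implementation. The function names (`cosine_score`, `hardest_impostor_pairs`, `equal_error_rate`) are hypothetical, the vectors are toy stand-ins for i-vectors, and the EER is approximated by a simple threshold sweep rather than by any NIST scoring tool.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two vectors (toy stand-ins for i-vectors)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def hardest_impostor_pairs(vectors, labels):
    """For each vector, find the different-speaker vector with the highest
    cosine score. Hypothetical helper illustrating the constraint selection
    described in the abstract; assumes at least two distinct speakers."""
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # length-normalize rows
    scores = X @ X.T                                  # pairwise cosine scores
    labels = np.asarray(labels)
    same_speaker = labels[:, None] == labels[None, :]
    scores[same_speaker] = -np.inf                    # exclude same-speaker pairs
    return [(i, int(np.argmax(scores[i]))) for i in range(len(labels))]


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over the observed scores and return
    the error rate where false acceptance and false rejection are closest."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([target, nontarget])):
        frr = np.mean(target < t)        # same-speaker trials rejected
        far = np.mean(nontarget >= t)    # different-speaker trials accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)


if __name__ == "__main__":
    vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
    spk = ["a", "a", "b", "b"]
    print(hardest_impostor_pairs(vecs, spk))
    print(equal_error_rate([0.9, 0.8, 0.2, 0.7], [0.1, 0.3, 0.85, 0.4]))
```

With real data, the target and non-target score lists would come from same-speaker and different-speaker trials respectively; the threshold sweep only approximates the EER when the two error curves do not cross exactly at an observed score.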

Crossref

High level speaker specific features modeling in automatic speaker recognition system

Author: Singh Satyanand
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/04/2020

Spoken words convey several levels of information. At the primary level, speech conveys words or spoken messages; at the secondary level, it also reveals information about the speaker. This work is based on high-level speaker-specific features and on statistical speaker modeling techniques that express the characteristic sound of the human voice. Hidden Markov model (HMM), Gaussian mixture model (GMM), and Linear Discriminant Analysis (LDA) models are used to build Automatic Speaker Recognition (ASR) systems that are computationally inexpensive and can recognize speakers regardless of what is said. The performance of the ASR systems is evaluated over a wide range of speech quality, starting from clear speech, using the standard TIMIT speech corpus. The ASR efficiencies of the HMM, GMM, and LDA based modeling techniques are 98.8%, 99.1%, and 98.6%, and the Equal Error Rates (EER) are 4.5%, 4.4%, and 4.55%, respectively. The EER improvement of the GMM-based ASR system compared with HMM and LDA is 4.25% and 8.51%, respectively.

Emerging Science Journal (ESJ)

Environmental Energy Harvesting Techniques to Power Standalone IoT-Equipped Sensor and Its Application in 5G Communication

Author: Singh Satyanand
Publication venue: Ital Publication
Publication date: 15/11/2021

In recent years, the Internet of Things (IoT) has gained a lot of attention due to its significant deployment to meet global demand for smart cities. Environmental energy harvesting devices, which use ambient energy to generate electricity, could be a viable option in the near future for charging or powering stand-alone IoT sensors and electronic devices. The key advantages of such energy harvesting devices are that they are environmentally friendly, portable, wireless, cost-effective, and compact. It is important to propose and fabricate improved, high-quality, economical, and efficient energy harvesting systems to overcome the power supply problem for tiny IoT devices at remote locations. In this article, various mechanisms for harvesting renewable energy that can locally power sensor-enabled IoT devices, as well as their associated wireless sensor networks (WSNs), are reviewed. These methods are discussed in terms of their advantages and applications, as well as their drawbacks and limitations. Furthermore, methodological performance analysis for the period 2005 to 2020 is surveyed in order to identify the methods that delivered high output power for each device. In addition, the outstanding breakthrough performances of each of the aforementioned micro-power generators during this period are emphasized. According to the research, thermoelectric modules can convert up to 2500×10^(-3) W/cm^2, thermo-photovoltaic 10.9%, piezoelectric 10,000 mW/cm^3, and microbial fuel cell 6.86 W/m^2 of energy. DOI: 10.28991/esj-2021-SP1-08

The role of speech technology in biometrics, forensics and man-machine interface

Author: Singh Satyanand
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2019

Optimism is growing day by day that in the near future our society will witness the Man-Machine Interface (MMI) using voice technology. Computer manufacturers are building voice recognition sub-systems into their new product lines. Although speech-technology-based MMI techniques have been widely used before, they still need to gather and apply deep knowledge of spoken language and performance during electronic machine-based interaction. Biometric recognition refers to a system that is able to identify individuals based on their own behavioral and biological characteristics. Given the success of fingerprints in forensic science and law enforcement applications, and with growing concerns relating to border control, banking access fraud, machine access control, and IT security, there has been great interest in the use of fingerprints and other biological traits for automatic recognition. It is not surprising to see that biometric systems are playing an important role in all areas of our society. Biometric applications include smartphone security access, mobile payment, international border control, national citizen registers, and reserve facilities. The use of MMI through speech technology, which includes automated speech/speaker recognition and natural language processing, has a significant impact on all existing businesses based on personal computer applications. With the help of powerful and affordable microprocessors and artificial intelligence algorithms, human beings can talk to the machine to drive and control all computer-based applications. Today's applications show a small preview of a rich future for MMI based on voice technology, which will ultimately replace the keyboard and mouse with the microphone for easy access and make the machine more intelligent.

High Level Speaker Specific Features as an Efficiency Enhancing Parameters in Speaker Recognition System

Author: Singh Satyanand
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2019

In this paper, I present high-level speaker-specific feature extraction that considers intonation, linguistic rhythm, linguistic stress, and prosodic features taken directly from speech signals. I assume that rhythm is related to language units such as syllables and appears as changes in measurable parameters such as fundamental frequency (F0), duration, and energy. In this work, syllable-type features are selected as the basic unit for expressing the prosodic features. The approximate segmentation of continuous speech into syllable units is achieved by automatically locating the vowel starting points. High-level speaker-specific knowledge is used as a reference for extracting the prosodic features of the speech signal. High-level speaker-specific features extracted using this method may be useful in applications such as speaker recognition, where explicit phoneme/syllable boundaries are not readily available. The effectiveness of these speaker-specific features for automatic speaker recognition was evaluated on the TIMIT and HTIMIT corpora, with the TIMIT data downsampled from its original 16 kHz to 8 kHz. In the experiments, the basic discriminating system and the HMM system are trained on the TIMIT corpus with a set of 48 phonemes. The proposed ASR system shows efficiency improvements of 1.99%, 2.10%, 2.16%, and 2.19% compared to the traditional ASR system for and of 16 kHz TIMIT utterances.